ggman: R package to create Manhattan plots using ggplot2.

Create a basic Manhattan plot

A toy GWAS dataset is made available along with the package. Let’s look at the dimensions, head and tail of the dataset.

library(ggman)

dim(toy.gwas)
## [1] 21751     7
head(toy.gwas)
##     chrom    snp        bp  pvalue         beta     or  gene
## 1.1     1  rs1_0 161003087 0.29540  0.099845335 1.1050 GENE1
## 1.2     1  rs1_5  55542379 0.56240  0.037295785 1.0380 GENE1
## 1.3     1 rs1_10 166549115 0.07658 -0.112049504 0.8940 GENE1
## 1.4     1 rs1_15  78291020 0.61850  0.044973366 1.0460 GENE1
## 1.5     1 rs1_20  40771489 0.58600  0.039220713 1.0400 GENE1
## 1.6     1 rs1_25  30693405 0.89610 -0.008536331 0.9915 GENE1
tail(toy.gwas)
##         chrom      snp       bp  pvalue         beta     or     gene
## Y.21746     Y rs22_973 21755931 0.76200  0.022739487 1.0230 GENE2177
## Y.21747     Y rs22_978 32781818 0.56720 -0.050346374 0.9509 GENE2177
## Y.21748     Y rs22_983 27958741 0.97060  0.002995509 1.0030 GENE2177
## Y.21749     Y rs22_988 26187172 0.05613 -0.121602822 0.8855 GENE2177
## Y.21750     Y rs22_993 23036298 0.82370 -0.014200349 0.9859 GENE2177
## Y.21751     Y rs22_998 31961908 0.17560 -0.167117723 0.8461 GENE2177

To create a Manhattan plot, only the first four columns (chrom, snp, bp, pvalue) are required. No specific preformatting of the column classes is needed. The chromosome identifiers can be either numbers (1, 2, 3, ...) or strings (“Chr1”, “Chr2”, ...).

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue")
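Before plotting your own results, it can help to confirm that the four required columns are present. The following is a minimal base R sketch, not part of ggman: demo.gwas is a made-up data frame, and required.cols and missing.cols are hypothetical helper names.

```r
# Hypothetical miniature results table (not part of ggman); chromosome
# identifiers are given as strings to show that no recoding is needed
demo.gwas <- data.frame(
  chrom  = c("Chr1", "Chr1", "Chr2"),
  snp    = c("rsA", "rsB", "rsC"),
  bp     = c(1000, 2500, 400),
  pvalue = c(0.2, 1e-9, 0.05),
  stringsAsFactors = FALSE
)

# Columns the plot needs, as listed in the text above
required.cols <- c("chrom", "snp", "bp", "pvalue")

# Names of any required columns that are absent (empty if all are present)
missing.cols <- setdiff(required.cols, names(demo.gwas))
if (length(missing.cols) > 0) {
  stop("Missing columns: ", paste(missing.cols, collapse = ", "))
}

# p-values should lie in (0, 1] so that -log10() is finite and non-negative
stopifnot(all(demo.gwas$pvalue > 0 & demo.gwas$pvalue <= 1))
```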

Use relative positioning

By enabling relative positioning, the base pair positions are scaled in proportion to the real genome positions, so gaps with no SNPs become visible. By default this is not enabled. To use relative positions, set the option relative.positions = TRUE.

ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)
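To make the idea of relative positioning concrete, here is a rough base R sketch of one way to turn per-chromosome base pair positions into a single genome-wide x coordinate. It is an illustration under stated assumptions, not necessarily how ggman computes positions internally: demo.snps is made-up data, and each chromosome's length is approximated by its largest observed position.

```r
# Hypothetical toy data: two chromosomes with base pair positions
demo.snps <- data.frame(
  chrom = c(1, 1, 1, 2, 2),
  bp    = c(100, 500, 900, 50, 300)
)

# Approximate each chromosome's length by its largest observed position
chrom.len <- tapply(demo.snps$bp, demo.snps$chrom, max)

# Offset of a chromosome = summed length of all chromosomes before it
offsets <- c(0, cumsum(chrom.len)[-length(chrom.len)])
names(offsets) <- names(chrom.len)

# Genome-wide coordinate: distances between SNPs stay proportional,
# so stretches without SNPs show up as gaps
demo.snps$x <- demo.snps$bp + unname(offsets[as.character(demo.snps$chrom)])
```

With numeric chromosome identifiers the chromosomes are ordered numerically; string identifiers such as "X" or "Chr10" would need an explicit ordering.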

Add labels

A specific set of points in the plot can be annotated by providing a data.frame containing only the SNPs that need to be labelled. Let’s take a subset of the main data frame toy.gwas.

#subset only the SNPs with -log10(pvalue) > 8
toy.gwas.sig <- toy.gwas[-log10(toy.gwas$pvalue)>8,]

# dimensions
dim(toy.gwas.sig)
## [1] 4 7
#head 
head(toy.gwas.sig)
##         chrom     snp       bp    pvalue      beta    or     gene
## 5.18986     5 rs02_25 19843813 8.075e-09 0.5641768 1.758 GENE2178
## 5.19009     5 rs02_38 14907898 1.658e-09 0.6195006 1.858 GENE2178
## 5.19074     5 rs02_74  9657902 7.084e-09 0.6119371 1.844 GENE2179
## 5.19089     5 rs02_83  6887869 4.057e-09 0.5988365 1.820 GENE2179
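If a results table contains missing p-values (an assumption; the toy data may have none), plain logical indexing as used above returns an extra all-NA row for each missing value. The base R sketch below, on a made-up data frame demo.p, shows the difference and one way to avoid it with which().

```r
# Hypothetical results with one missing p-value
demo.p <- data.frame(
  snp    = c("rsA", "rsB", "rsC", "rsD"),
  pvalue = c(2e-9, 0.3, NA, 5e-10),
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA comparison yields an all-NA row
naive.sig <- demo.p[-log10(demo.p$pvalue) > 8, ]

# which() keeps only positions where the comparison is TRUE
safe.sig <- demo.p[which(-log10(demo.p$pvalue) > 8), ]
```

Here naive.sig has three rows, one of them entirely NA, while safe.sig contains only the two SNPs that pass the threshold.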

The main layer of the Manhattan plot should be saved in a variable and then passed to the ggmanLabel function. The names of the columns holding the SNPs and the labels have to be supplied. In this case, we will label with SNP identifiers.

## save the main layer in a variable
p1 <- ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)

##add label
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp")

Annotations can be just text instead of labels. Use the type= argument.

#add text
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", type = "text")

The R package ggrepel is used for annotations. All the arguments that are applicable to geom_text_repel and geom_label_repel can be passed on to ggmanLabel. Let’s change the size and colour of the labels.

ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", colour = "black", size = 2)

Caution: providing the whole main data frame as labelDfm will fill the entire plot with text and might crash R if the data frame is too big.
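One way to keep the label data frame small is to cap it at the strongest few signals. The following base R sketch uses a made-up data frame demo.hits; the cap max.labels is a hypothetical choice, not a ggman setting.

```r
# Hypothetical set of significant SNPs
demo.hits <- data.frame(
  snp    = paste0("rs", 1:6),
  pvalue = c(1e-9, 3e-12, 5e-9, 2e-10, 8e-9, 4e-11),
  stringsAsFactors = FALSE
)

# Arbitrary upper limit on the number of labels to draw
max.labels <- 3

# Sort by p-value (smallest first) and keep at most max.labels rows
demo.top <- head(demo.hits[order(demo.hits$pvalue), ], max.labels)
```

A data frame built this way could then be supplied as labelDfm in place of the full subset.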

Highlight a single group of points

The function ggmanHighlight can be used to highlight a single group of points. By default, while highlighting specific points, the main layer of the Manhattan plot is greyed out. We need to supply a vector of SNP names to highlight. The example vector toy.highlights comes with the package.

class(toy.highlights)
## [1] "character"
length(toy.highlights)
## [1] 209
head(toy.highlights)
## [1] "rs02_2"  "rs02_7"  "rs02_12" "rs02_17" "rs02_22" "rs02_27"
ggmanHighlight(p1, highlight = toy.highlights)
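A highlight vector of your own can be built by pulling SNP names out of the main data frame. The sketch below is a base R illustration on a made-up data frame demo.genes; selecting by gene is just one possible criterion.

```r
# Hypothetical results table with a gene annotation per SNP
demo.genes <- data.frame(
  snp  = c("rsA", "rsB", "rsC", "rsD"),
  gene = c("GENE1", "GENE2", "GENE1", "GENE3"),
  stringsAsFactors = FALSE
)

# Character vector of SNP names belonging to the gene of interest
demo.highlights <- demo.genes$snp[demo.genes$gene == "GENE1"]

# The highlight argument expects a character vector, like toy.highlights
stopifnot(is.character(demo.highlights))
```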

Highlight multiple groups of points with a legend

The function ggmanHighlightGroup can be used to highlight multiple groups of points and a legend can be added. Let’s look at the example file toy.highlights.group.

class(toy.highlights.group)
## [1] "data.table" "data.frame"
dim(toy.highlights.group)
## [1] 609   8
head(toy.highlights.group)
##   chrom       snp       bp    pvalue       beta    or     gene  group
## 1    13  rs06_2_M 24226825 0.0794900 0.18148788 1.199 GENE2180 group2
## 2    13  rs06_7_M 23664350 0.0005127 0.36325326 1.438 GENE2180 group2
## 3    13 rs06_12_M 19042292 0.2111000 0.13627762 1.146 GENE2180 group2
## 4    13 rs06_17_M 24586858 0.0193900 0.27459683 1.316 GENE2180 group2
## 5    13 rs06_22_M 20332216 0.4479000 0.09621886 1.101 GENE2180 group2
## 6    13 rs06_27_M 24855237 1.0000000 0.00000000 1.000 GENE2180 group2

Unlike ggmanHighlight, the function ggmanHighlightGroup requires a data.frame as input. One of the column names should be supplied as the grouping variable. The size of the highlighted points can be changed with the size argument. The legend title can be specified with the legend.title argument.

ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.title = "Significant groups")

It is also possible to remove the legend using the legend.remove argument.

ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.remove = TRUE)
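A grouped highlight table like toy.highlights.group can be assembled by subsetting the results and adding a grouping column. This is a minimal base R sketch on made-up data; the names demo.tab and demo.groups and the group labels are hypothetical.

```r
# Hypothetical results table with a gene annotation per SNP
demo.tab <- data.frame(
  snp  = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  gene = c("GENE1", "GENE2", "GENE1", "GENE3", "GENE2"),
  stringsAsFactors = FALSE
)

# Keep only the SNPs in the genes of interest
demo.groups <- demo.tab[demo.tab$gene %in% c("GENE1", "GENE2"), ]

# Grouping column: one label per gene, to be named in the group argument
demo.groups$group <- ifelse(demo.groups$gene == "GENE1", "group1", "group2")
```

The resulting data frame has one row per highlighted SNP and a group column, which is the shape ggmanHighlightGroup is described as expecting.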

Zoom in to a specific chromosome

The function ggmanZoom can be used to create a regional association plot. The chromosome and the starting and ending base pair positions should be specified. If only the chromosome is specified, the whole chromosome will be shown. First, let’s see the whole chromosome 1 plot.

ggmanZoom(p1, chromosome = 1)

Next, let’s zoom in to the chromosome 1 region containing the genes GENE21, GENE22 and GENE23.

ggmanZoom(p1, chromosome = 1, start.position = 14209481, end.position = 238131450)

Let’s highlight the genes and add a legend.

ggmanZoom(p1, chromosome = 1, start.position = 14209481, end.position = 238131450, highlight.group = "gene")
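To check which SNPs fall inside a region before zooming, the same chromosome and position limits can be applied to the data frame directly. The following base R sketch uses made-up data and hypothetical names (demo.pos, zoom.chrom, zoom.start, zoom.end); it only mirrors the selection conceptually and does not call ggmanZoom.

```r
# Hypothetical results table
demo.pos <- data.frame(
  chrom = c(1, 1, 1, 2),
  snp   = c("rsA", "rsB", "rsC", "rsD"),
  bp    = c(100, 5000, 9000, 5000),
  stringsAsFactors = FALSE
)

# Region of interest: chromosome plus start and end positions
zoom.chrom <- 1
zoom.start <- 1000
zoom.end   <- 9500

# Rows on the chosen chromosome whose position lies within the limits
in.region <- demo.pos$chrom == zoom.chrom &
  demo.pos$bp >= zoom.start &
  demo.pos$bp <= zoom.end
demo.region <- demo.pos[in.region, ]
```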

Zoom in to a specific region of a chromosome

Highlight points in the zoomed region

Plot Odds ratio

Plot beta

Create an inverted Manhattan plot